Each data instance of a Type with a TypeKind of “Character” is what Unicode terms a “code unit”. Note that a single “code point” may require more than one “code unit” (each letter, punctuation mark, symbol, etc. in a writing system is typically associated with a “code point”).
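As an illustration only, using Python's built-in codecs rather than anything defined by the register, the sketch below lists the UTF-16 code units needed for individual code points. Code points in the Basic Multilingual Plane fit in one 16-bit code unit, while a supplementary-plane code point such as U+1F600 needs a surrogate pair, i.e. two code units. The helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units (as 16-bit integers) for `text`.

    Encoding as big-endian UTF-16 (no byte order mark) yields exactly two
    bytes per code unit, which are paired back up into integers here.
    """
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# "A", "é", "€" and an emoji: the first three are single code units.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    units = utf16_code_units(ch)
    print(f"U+{ord(ch):04X} -> {len(units)} code unit(s): "
          + " ".join(f"0x{u:04X}" for u in units))
```

The last line of output shows U+1F600 split into the two code units 0xD83D and 0xDE00.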

A “code unit” is typically a fixed number of bytes (for example, two bytes in UTF-16). The byte order (endianness) of multi-byte code units is format-dependent: for example, in MXF, UTF-16 “code units” are big-endian (“UTF-16BE”).
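A minimal sketch of the byte-order point, again using Python's standard codecs rather than any MXF library: the same UTF-16 code units serialize to different byte sequences under UTF-16BE and UTF-16LE. The explicit big-endian and little-endian codecs write no byte order mark, so a reader has to know the order in advance (as MXF specifies). The function name `code_unit_bytes` is illustrative only.

```python
def code_unit_bytes(text: str, codec: str) -> str:
    """Encode `text` with `codec` and return the bytes as space-separated hex."""
    return text.encode(codec).hex(" ")


text = "Hi\u20ac"  # "Hi€": three code points, three UTF-16 code units
print("UTF-16BE:", code_unit_bytes(text, "utf-16-be"))  # high byte first
print("UTF-16LE:", code_unit_bytes(text, "utf-16-le"))  # low byte first

# Decoding big-endian bytes as little-endian silently yields other code points.
be_bytes = text.encode("utf-16-be")
misread = be_bytes.decode("utf-16-le")
print([f"U+{ord(c):04X}" for c in misread])
```

Because nothing in the bytes themselves marks the order, a wrong assumption produces valid but meaningless characters rather than an error.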

The Definition field must include the official IANA character set “Name” (see http://www.iana.org/assignments/character-sets/character-sets.xhtml). This can be interpreted as a statement of both:

  • the allowed “code points” that can be represented by data instances of this Type
  • the byte encoding of data instances of this Type. This is only relevant to certain implementations: for example, it is relevant for an MXF file, because each data instance is KLV-encoded as a sequence of bytes, but it is not relevant for a Reg-XML document, because all data there is represented as text; the byte encoding of the entire Reg-XML document is declared independently, in the XML declaration at the top of the document.
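To illustrate the two readings of the IANA “Name”, the hypothetical helper below uses it both as a test of which code points are allowed and as a byte encoding. It assumes that Python's codec registry recognises the names used; this holds for common names such as “US-ASCII”, “ISO-8859-1”, “UTF-8” and “UTF-16BE”, but Python's codec aliases are not a complete or authoritative mirror of the IANA registry.

```python
import codecs
from typing import Optional


def encode_for_charset(iana_name: str, text: str) -> Optional[bytes]:
    """Return `text` encoded per `iana_name`, or None if a code point is not allowed.

    Assumption: Python's codec lookup accepts the IANA name (or an alias of it);
    codecs.lookup raises LookupError for names it does not know.
    """
    codec = codecs.lookup(iana_name)
    try:
        # CodecInfo.encode returns (encoded bytes, number of characters consumed).
        return codec.encode(text)[0]
    except UnicodeEncodeError:
        # Reading (a): some code point is outside the allowed repertoire.
        return None


for name in ["US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE"]:
    result = encode_for_charset(name, "caf\u00e9")
    # Reading (b): the same text yields different bytes per charset.
    print(name, "->", "not representable" if result is None else result.hex(" "))
```

For a Reg-XML document only the first reading would matter; the bytes would follow the document's own declared encoding instead.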

Note that it is essential to read and interpret the Definition field to understand how each Type with a TypeKind of “Character” is to be handled, because the entry in the Types Register provides no reliable, machine-readable way to signal this.

Note that the definitions of the “Character” TypeKind given above, in AAF, and in Reg-XML, are all slightly different.